Text File | 1992-07-11 | 72.6 KB | 1,300 lines |
- Guidelines to use 8-bit character codes
- Version 2. July 1992.
-
- A. Pirard
- University of Liege
- Belgium
-
- Important preliminary notice
-
- This file contains translation tables between proprietary codes and
- ISO codes. As indicated, some translate several characters arbitrarily by
- lack of a known definition of this translation by the owner of the code
- (constructor). So, watch this space for an update indicating any news as
- I get to know it.
-
- Since version 1:
-
- - At the request of SHARE, IBM has:
- - defined a new code page 1047 compatible with the de-facto EBCDIC.
- - defined a new code page 819 corresponding to ISO 8859-1.
- - published a document listing the translation between 819 and the SAA
- code pages 850 and 500, from which other translations may be deduced.
- - see summary of changes at the top of the paragraph about IBM.
- - So, the translation tables between 8859-1 and PC codes have been
- changed accordingly.
- - The translation of the Macintosh code has been changed to account for 6
- ISO characters that appear in an Icelandic Macintosh code and translate
- arbitrarily otherwise. This pushed away 4 other arbitrary translations.
- - IBM code pages 850 and 1047 are considered the preferred tables; other
- translations were moved to a secondary file to reduce size.
-
- Changes to the text:
- - more complete explanation of keyboard handling for the PC.
- - updating explanations to follow evolution of usage and terminology.
- - minor revisions for clarity.
-
-
-
- Introduction.
-
- In the course of my work in communications in a French-speaking
- environment -- writing programs, installing but mostly having to adapt
- others -- I discovered facts, notions, techniques and data related to
- international characters usage. Many English-speaking programmers are
- willing to extend the scope of their software to what is for them
- "foreign languages". Discussion with them is often lengthy, because
- numerous details are obvious to one and obscure to the other. Trying
- to help without repeating the same words all over again is the reason for
- this document.
-
- This text is restricted to the problem of the character codes used
- in data. Yet, I should mention briefly that isolating from executable
- code the user interface messages is a real plus. These messages should be
- easily translatable by anyone who knows the language, even if source is
- unavailable. Anything similar to the Macintosh resources is ideal. Lest
- this goal seem too easy, I must warn that phrases in many languages are
- longer than in English and that the order of inserts may vary depending
- on grammar.
-
- I am much indebted to the people I met on networks and on the
- mailing list ISO8859@JHUVM for their discussion (especially Edwin Hart
- HART@APLVM, with his SHARE White Paper to IBM). The international
- community owes much to the Kermit developers group led by Christine
- Gianone and fed by Frank da Cruz and many volunteers who produced several
- Kermit versions using the principles described in this document and store
- character codes related data on WATSUN.CC.COLUMBIA.EDU:kermit/charsets. I
- should also thank many other people for their interest, especially those
- who adapted their programs, but I am truly unable to mention them all.
- You will know when some ISO 8859 setting catches your eye.
-
- At the risk of a lack of justification, I have made every effort to
- keep this text as concise as possible to spare your time. One will have
- to think beyond the text in some places. On the other hand, please excuse
- me if some paragraphs state the obvious: it is sometimes needed. Also
- remember that English is not my mother tongue...
-
-
-
- A language among others: French.
-
- Like many other languages, French uses characters not found in
- English. It likes to adorn them with diacritics (accents). Other
- languages use other characters, from a few like German to totally
- different like Russian and Greek, or even the right to left Arabic and
- Hebrew. To the question: "could you do without them?", I like to reply
- that forgetting them in "a la francaise" makes it mean "has the French
- girl". "a" must take a grave accent to distinguish the preposition from
- the verb and "c" takes a cedilla. French without diacritics is certainly
- not unreadable, rarely ambiguous with the aid of context (i. e. to humans
- but not to computers), but just as unpleasant as all-uppercase text and
- difficult to read: one stumbles on most missing accents, as when
- proof-reading a child's dictation. In the general case, many languages
- cannot do without their own characters anyway.
-
-
-
- Terms.
-
- A "character" is what one writes down on paper. A "code" is a
- computer representation of a set of characters that we can see as
- associated to numbers called "code points". A code usually includes
- "control characters" for which a graphic representation does not normally
- exist, because they are only used to control the operation of hardware or
- have special meanings to programs.
-
-
-
- 7-bit character codes
-
- ASCII (ANSI X3.4) was defined as a 7-bit code for English at a time
- when hardware was really hard and expensive. To allow the use of some of
- those particular characters that other languages need, it was later
- decided that a defined subset (the least used ones) could be replaced.
- This is ISO 646. Several languages had the subset replaced with their own
- characters. This is what the Escape sequences of Epson printers do to
- switch to a national language. ANSI X3.4 became an instance of ISO 646.
- But, for some languages like French, the number of characters that can
- be replaced is not enough, and text processing of those days made
- extensive use of backspaces and overstrikes for the missing ones. On the
- other hand, replacing programming symbols with national characters
- introduces much confusion in programming languages, like a comment being
- terminated by its own text, and in several uses of those characters (e.g.
- in e-mail or Unix) where the national meaning clashes with the ASCII one.
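-
- A minimal sketch, in Python, of a comment being terminated by its own
- text. The replacement table is an assumption: it lists the French
- ISO 646 assignments as commonly described (e-acute and e-grave in place
- of the braces, and so on) and should be checked against the national
- standard. The Pascal-style comment stripper is only an illustration.
-
```python
# Assumed French ISO 646 replacements: ASCII code point -> national character.
FRENCH_646 = {
    0x40: "à",  # displayed instead of @
    0x5C: "ç",  # displayed instead of \
    0x7B: "é",  # displayed instead of {
    0x7C: "ù",  # displayed instead of |
    0x7D: "è",  # displayed instead of }
}
TO_ASCII = {ch: chr(code) for code, ch in FRENCH_646.items()}


def as_seen_by_ascii_tool(national_text: str) -> str:
    """Return the 7-bit text as a program assuming ASCII would read it."""
    return "".join(TO_ASCII.get(ch, ch) for ch in national_text)


def strip_brace_comments(source: str) -> str:
    """Remove Pascal-style { ... } comments (no nesting)."""
    out, in_comment = [], False
    for ch in source:
        if in_comment:
            in_comment = ch != "}"   # the first "}" closes the comment
        elif ch == "{":
            in_comment = True
        else:
            out.append(ch)
    return "".join(out)


# What the French user sees on screen: the braces show as e-acute/e-grave,
# and the comment text itself contains an e-grave.
on_screen = "x := 1; é valeur du modèle è y := 2;"
ascii_view = as_seen_by_ascii_tool(on_screen)
compiled = strip_brace_comments(ascii_view)
```
-
- Here ascii_view contains "mod}le", so the comment stops in the middle of
- the word and the rest of the sentence is taken for program text.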
-
- US EBCDIC (an IBM code) used more or less the same characters as
- ASCII, but used different code points. I should say "more and less". Some
- ASCII characters did not exist in EBCDIC (e. g. square brackets) and
- EBCDIC had ones (cent sign, not sign) that were not in ASCII. As a
- consequence, the translation between ASCII and EBCDIC was strictly
- speaking undefined, and IBM never officially defined a complete one.
- Users defined one translation which resulted in a so-called de-facto
- EBCDIC containing all the characters of ASCII, that all ASCII-related
- programs use. Although EBCDIC was an 8-bit code "with holes", IBM made
- the same character replacements as ISO 646 in hardware for use with other
- languages (but, again, as other characters were missing, this was of
- little use to French).
-
- Even though data was stored in octets, 7-bit communication lines were
- used and it was (and still is) common practice for software to strip off
- the 8th bit despite a possible extension of the code, future or existing.
-
- We lived a long time of computer frustration. Is the problem solved?
-
-
-
- 8-bit character codes
-
- Storing in a database text full of "this backspace that", trying to
- sort it etc... or getting a Sterling pound bill paid in dollars because
- that's what the dollar sign is replaced with in the English version of
- ISO 646 was a real pain and an insult to the octet. It was soon realized
- that, even if text processing could cope to some extent with compound
- characters, data processing could not at all. One character must be one
- data element of constant width.
- With the era of cheaper hardware and microcomputers, manufacturers
- started to use the upper half of the 256 code points of the common 8-bit
- byte for international characters. This was one major reason for the
- success of these computers on the international market.
- But there was no standard and each did it his own way as to which
- characters and which code points to use, like to-day's DEC, Apple, Atari,
- Commodore or other less known brands. The IBM PC was built with yet
- another code that was later called "code page 437" and that everyone in
- the compatible business settled on. But IBM also built PCs with
- variations for countries using characters that were not in 437, now
- called 860, 863 and 865.
-
- There was an evident Babel and a new standard had to be set.
- National institutions and many constructors participated to produce the
- ISO 8859 standard. As 256 code points are not enough for all languages in
- the world, several "versions" of this standard exist (see below for a
- list, still evolving). ISO 8859-1 is for group 1 of Latin-based languages
- and covers Western Europe, including English, hence many major countries
- in North and South America, Australia and many others world wide.
- A new multibyte standard is being prepared: ISO 10646 -- in which
- ISO 8859-1 is a contiguous subset --, that will cover all languages in a
- single code. "Unicode" -- a code being defined by a consortium of
- manufacturers -- and ISO 10646 joined: Unicode will be a 2-byte subset of
- 4-byte ISO 10646, with the remarkable result of a single worldwide code.
- Until ISO 10646 can be used, today's hardware and software, strongly
- single-byte oriented, can easily extend the scope of a character code to
- 8 bits and one version of ISO 8859. That the particular version in use
- is only implied by a group of languages is regrettable indeed, but it
- must be understood that it is a dramatic improvement in a country or a
- group of countries where data is implicit anyway.
- For short, I may call "ISO 8859" or simply "ISO" in the following
- text any version that a system uses at any one time, when assuming that
- the systems do not switch versions dynamically, but that the user can
- set up the choice of the version he uses, if not implied by hardware.
-
- ISO 8859 (any version) is an extension of ASCII. The upper half (in
- fact, 128-159 are reserved for more control characters) is filled with
- characters for a group of countries. The present trend to use ISO 8859 is
- certain. Version 1 is much like the previous DEC's "8-bit ASCII code",
- and VT terminals now have a setup to use 8 bits and ISO 8859 (and Escape
- sequences to switch among and display several ISO 8859 versions). Looking
- at Microsoft and Lotus international codes, one notices that they had
- soon adopted a "pre-release" of ISO 8859-1 (Microsoft calls ISO 8859-1
- "ANSI code" in their documentation of Windows). As explained below, IBM
- have adopted ISO 8859-1 their own way. X-Windows specifications (from
- MIT, of a presentation system on a remote graphic terminal) prescribe
- that ISO 8859-1 is to be used on the communication line. By mutual
- agreement, a growing number of universities and institutions exchange
- data in ISO.
-
- ISO 8859-1, Latin Alphabet 1, for Danish, Dutch, English, Faeroese,
- Finnish, French, German, Icelandic, Irish, Italian, Norwegian,
- Portuguese, Spanish, and Swedish.
- ISO 8859-2, Latin Alphabet 2, for Albanian, Czech, English, German,
- Hungarian, Polish, Romanian, Serbo-Croatian, Slovak, and Slovene.
- ISO 8859-3, Latin Alphabet 3, for Afrikaans, Catalan, English, Esperanto,
- French, Galician, German, Italian, Maltese, and Turkish.
- ISO 8859-4, Latin Alphabet 4, for Danish, English, Estonian, Finnish,
- German, Greenlandic, Lappish, Latvian, Lithuanian, Norwegian, and
- Swedish.
- ISO 8859-5, the Latin/Cyrillic Alphabet, for Bulgarian, Byelorussian,
- Macedonian, Russian, Serbo-Croatian, and Ukrainian.
- ISO 8859-6, the Latin/Arabic Alphabet.
- ISO 8859-7, the Latin/Greek Alphabet.
- ISO 8859-8, the Latin/Hebrew Alphabet.
- ISO 8859-9, Latin Alphabet 5, for Danish, Dutch, English, Faeroese,
- Finnish, French, German, Irish, Italian, Norwegian, Portuguese, Spanish,
- Swedish, and Turkish.
-
-
-
- The "foreign" environment.
-
- So, these facts of language make our typewriters different, and the
- computer keyboards are modelled after them: a few letters moved about,
- digits on the uppercase side, accented letters in place of programming
- symbols etc... More striking, if you pardon the pun, is that -- because
- the number of keys is not enough for all the French characters -- some
- so-called dead-keys are used to compose accented letters: a strike of
- one of them followed by another letter gives a single code point as
- program input, just as a typewriter would overtype.
-
- It must be realized that, to an international computer user, an
- 8-bit code is just as natural as the 7-bit one of English-speaking users.
- 8-bit code points "come out" of some plain keys of the keyboard and are
- expected to display. If a program filters them out, this will be
- shocking. If it uses these code points for internal control functions,
- the user will be confused by "strange behavior" that a US keyboard would
- never exhibit. For example, if it strips the 8th bit of a PC e-grave
- (code point 8Ah in code page 437), it produces a disturbing linefeed
- (0Ah). Or if a program decides that normal characters belong to the
- range 32-127, this will play havoc. It is worth
- checking a program with such data, that some keyboards can produce with
- alternate input.
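-
- The two failures just described can be reproduced in a few lines of
- Python. This is only a sketch: it assumes the PC code is code page 437,
- as provided by Python's "cp437" codec, and the function names are made
- up for the illustration.
-
```python
def strip_eighth_bit(data: bytes) -> bytes:
    """What a 7-bit-minded program does to every octet."""
    return bytes(b & 0x7F for b in data)


def keep_normal_range(data: bytes) -> bytes:
    """What a program does when it believes text is limited to 32-127."""
    return bytes(b for b in data if 32 <= b <= 127)


pc_text = "très".encode("cp437")        # e-grave is 8Ah in code page 437
stripped = strip_eighth_bit(pc_text)    # 8Ah becomes 0Ah, a linefeed
filtered = keep_normal_range(pc_text)   # the accented letter vanishes
```
-
- A test file containing every code point from 128 to 255 is a cheap way
- to find out whether a program behaves like either function.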
-
- Trust little about the keyboard layout and physical scan-codes. The
- only reliable input is through the operating system or country-
- configurable keyboard driver interface. Working with physical input is
- trying to duplicate the varying and sometimes complicated logic of those
- drivers (maybe covering several keyboards per country) and heading for
- problems or incomplete coverage. Assuming that one can use transformation
- of one strike to one code point is incorrect, because of the dead-keys.
- Using the state of special keys of the PC (Shift, Ctrl, Alt etc...) to
- try to modify the meaning of what the system outputs (a usual feature of
- communication programs) is not the best idea either, because keyboard
- recorders rarely replay the shift states along with that output. And, in
- general, mixing input from different levels is unsafe: strictly speaking,
- these states are asynchronous with the input, one may read a key code
- when the shift state has disappeared. Yes, a program is usually faster
- than the user, but can one swear that a fast, long buffered auto repeat
- makes this true in all cases? Imagine your output being blocked by
- network flow control... Oh yes, it can happen.
-
- As an example, here is what can be done on that PC I know well. The
- keyboard driver outputs 2 bytes, H and L.
- When H is nonzero, it is the physical position of the key pressed;
- so, unless the documentation really wishes to refer to the key by
- position and not keycap for such things as diamond-shaped or in-a-row key
- groups, ignore this value and simply use L as final data: it is extended
- ASCII to be used as such (or, at most to go through a code translation as
- discussed hereafter). Note that different keys (different H) may produce
- the same code-point (L); e.g. L is 0 for an Alt/literal-number of the PC.
- When H is zero, a special key combination has been pressed,
- indicated by the value of L to be used to index a table of actions of the
- program. The PC defines 166 such special key combinations (0+L) and the
- intention of the application designer -- when using modifying shift
- states -- is to provide more, or, also, those the user really wants. The
- 90 values of L are probably enough additional definitions (but some or
- all of the 166 could even be redefined or even "impossible" H combined
- with a 256 L multiplicator).
- Hence, the simplest method is to assign pre-defined additional
- pseudo-scan-codes (0+L) -- and, repeat, certainly not extended ASCII code
- points -- to the actions of the program and to manage to have the
- keyboard driver produce them on any key the user chooses. Here is how to
- do that.
- Each time a key is pressed (or released), the keyboard driver -- be
- it in ROM or the keyb... driver -- calls software interrupt 15h with 4Fh
- in register AH, with carry flag set and the physical scan code of the key
- in register AL (ored with 80h when released, so that this case can easily
- be ignored). The application may intercept interrupt 15h, test for a key
- it wants together with the shift keys states it wants (safe at this
- level).
- - If AH is not 4Fh or AL is an unwanted key, the processor's flags and
- registers are left as on program entry (with carry set), and control is
- transferred to the next interrupt 15h handler (as any well-behaved
- interceptor must do); eventually, the keystroke will be used or ignored
- by someone else. Usually, this transfers to a dummy interrupt and returns
- to the keyboard driver to use the keystroke in the normal way.
- - If the key is wanted, interrupt 16h is called with AH=5 and CH+CL set to
- what is to be placed in the keyboard buffer queue to be the input to the
- application. Then, return is made to the caller with carry flag cleared
- to indicate to the keyboard driver that the keystroke is used and that it
- is to ignore it and clean up the hardware interrupt.
- - One can insert anything in the keyboard buffer, extended ASCII (PC
- code) or pseudo-scan-code: a keyboard recorder will receive that and
- replay it faithfully (but, of course, your inventions will be meaningful
- only to your own application). This is the way to even produce "Enter"
- with the right Ctrl key as IBM 3270 emulations do (if you really insist,
- I personally hate this). However, remember that inserting extended ASCII
- may be in conflict with the choice of a particular keyboard or code page
- for which it is different: again, the keyboard driver knows much better
- about that.
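-
- The decision logic of the three cases above can be sketched as a small
- simulation in Python. It is not a real interrupt handler: the function,
- the bindings table, the shift state strings and the pseudo-scan-code F0h
- are all hypothetical, and appending to a list stands in for the call to
- interrupt 16h with AH=5.
-
```python
KEYBOARD_INTERCEPT = 0x4F   # value of AH for the intercept call
RELEASE_BIT = 0x80          # set in AL when the key is released


def intercept(ah, al, shift_state, bindings, keyboard_buffer):
    """Simulate the application's interrupt 15h handler.

    Returns the carry flag as the keyboard driver will see it:
    True  -> keystroke passed on untouched to the next handler,
    False -> keystroke used, the driver must ignore it.
    """
    if ah != KEYBOARD_INTERCEPT or al & RELEASE_BIT:
        return True                      # not a key press: pass it on
    pseudo_code = bindings.get((al, shift_state))
    if pseudo_code is None:
        return True                      # unwanted key: pass it on
    # Queue the (H, L) pair, i.e. 0 + pseudo-scan-code, as application input.
    keyboard_buffer.append((0, pseudo_code))
    return False


# Hypothetical binding: scan code 3Bh (F1) with Ctrl gives pseudo-scan-code F0h.
bindings = {(0x3B, "ctrl"): 0xF0}
buffer = []
used = intercept(0x4F, 0x3B, "ctrl", bindings, buffer)
released = intercept(0x4F, 0x3B | RELEASE_BIT, "ctrl", bindings, buffer)
other_key = intercept(0x4F, 0x1E, "none", bindings, buffer)
other_service = intercept(0x88, 0x3B, "ctrl", bindings, buffer)
```
-
- Because the binding table is plain data, the user can choose which key
- produces each action without the program ever looking at keycaps.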
-
- I am no specialist of the Macintosh internals, but I guess there's a
- similar story to tell for it.
-
-
-
- 8-bit codes in communications.
-
- We now realize that exchanging data between those computers with
- proprietary 8-bit codes is to international users exactly like sending
- data from an ASCII machine to an EBCDIC one: translation has to occur
- somewhere.
- Which is to translate what to what? Communication, if to work at
- all, relies heavily on strict standards. If communication between EBCDIC
- and ASCII computers is feasible, it is because of the well known fact --
- so well that one often forgets to state it -- that characters on a
- communication line must be ASCII. Just imagine there were no such rule.
- Then realize that there is no clearly stated equivalent for
- international characters, just tacit agreement.
- It is urgently needed to stop any sorts of hacking. I know of at
- least 25 different codes with characters similar to ISO 8859-1 that a
- file receiver would have to try to detect and know if there were no rule.
- This makes 600 translation tables (25 x 24, one per direction between
- each pair of codes). This text advocates standard
- communication and simplicity with one code on a given computer.
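-
- The arithmetic behind this argument, as a short Python sketch (the
- function names are made up). Without a rule, every code needs a table
- from each of the others; with one standard code on the line, each
- proprietary code needs only one table in each direction.
-
```python
def direct_tables(n_codes: int) -> int:
    """Tables needed when every code translates directly to every other."""
    return n_codes * (n_codes - 1)


def tables_via_standard(n_codes: int) -> int:
    """Tables needed when every code translates to and from one standard."""
    return 2 * n_codes


without_rule = direct_tables(25)
with_rule = tables_via_standard(25)
```
-
- The first count grows with the square of the number of codes, the second
- only linearly, and each system has to hold just its own pair of tables.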
-
- The only solution is to state that each and every octet of text data
- carried on a communication line cannot be anything else than an official
- standard and that, while waiting for a single multi-octet standard, each
- language uses only one standard. ISO 8859 fills this purpose and is the
- only official standard. It is already used by major firms and some
- protocols like X-Windows.
-
-
-
- Conclusions.
-
- A) An "8-bit clean" computer is one allowing characters to have the 8th
- bit set. If such a computer (like more and more Unixes of these days) is
- to choose a code, the obvious, painless one to avoid any translation is
- the standard: a version of ISO 8859. Note that such a machine becomes
- code-dependent only by 1) the system messages in the user's language and
- 2) the terminals and other peripherals used to display and enter the data
- (hence, other messages). It might seem that owning a uniform environment
- of PCs or Macs and their printers could make their code the best choice
- for a nearby Unix machine. In the long run, this will cause problems when
- that environment is integrated in networking with other sites. And
- internetworking is moving fast and spreads standards. Better start right
- than have a computerful of data to translate one day. By the way, note
- that most terminal emulators already use ISO 8859-1.
-
- B) If a computer is forced to continue using a code different from but
- with a character set similar to a version of ISO 8859, it must behave
- with regard to what it sends on and receives from communication lines as
- if it were using that version of ISO. This means that the key feature of
- protocols (like file transfer in text mode or electronic mail) is to
- implement translation of the data that this protocol exchanges with the
- communication line. This applies to both services provided by a host and
- terminal (client) functions provided by stations. In normal usage, this
- translation is expected to always be to ISO 8859, but, to ease the
- transition period, the translation may be selectable, especially to
- revert to the compatible case of null translation. However, the user
- should be advised that the preferred translation is to ISO (and that it
- in no way impairs communication restricted to ASCII).
- In such a case, a requirement is to define a "best fit" translation
- between the proprietary code and that ISO version for text file transfer.
- Characters identical in both sets produce a meaningful code point
- translation; the translation of other characters is arbitrary but must be
- well defined. The important point is that this translation must be one to
- one and invertible for all the 256 characters (that is, each character
- translates to a different one and the reverse translation returns the
- original value). The translation of the lower half of an extension of
- ASCII is null. This kind of translation is valuable even if translating
- characters to totally different ones in operations like file transfer,
- instead of trying to obtain look-alike or multiple ones. The reason is
- that doing otherwise may permanently corrupt data that cannot be fully
- processed later, be it just to return or forward it. It is better to
- obtain partially meaningless data (in appearance) and to be able to
- process it locally (e.g. print it more meaningfully) than to assume that
- the goal of network transfer is final usage. Note that even if a system
- does not use some subset of the code points, it may have to receive files from
- systems that do.
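-
- A minimal sketch of such a "best fit" table in Python, assuming code
- page 850 as the proprietary code and relying on Python's own "cp850"
- and "latin-1" codecs. The arbitrary part (leftover characters assigned
- to the free code points in ascending order) is this example's own
- choice and is NOT the assignment published by any constructor.
-
```python
def build_invertible_table(local: str, iso: str = "latin-1") -> bytes:
    """Return a 256-entry table local -> ISO that is one to one."""
    table = [None] * 256
    used = set()
    for b in range(256):
        try:
            target = bytes([b]).decode(local).encode(iso)
        except UnicodeError:
            continue                 # no counterpart: handled below
        if len(target) == 1 and target[0] not in used:
            table[b] = target[0]     # same character in both codes
            used.add(target[0])
    free = [v for v in range(256) if v not in used]
    for b in range(256):
        if table[b] is None:
            table[b] = free.pop(0)   # arbitrary but well defined
    return bytes(table)


def invert(table: bytes) -> bytes:
    """Return the reverse table; valid because the table is a permutation."""
    inverse = bytearray(256)
    for src, dst in enumerate(table):
        inverse[dst] = src
    return bytes(inverse)


to_iso = build_invertible_table("cp850")
from_iso = invert(to_iso)
all_octets = bytes(range(256))
round_trip = all_octets.translate(to_iso).translate(from_iso)
```
-
- Whatever a file contains, translating it to ISO and back returns the
- original octets, which is the property the paragraph above asks for.
-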
- A main difficulty is that this translation should be unique for a
- given system, so that two computers running this system are able to
- exchange data of their own code under the above rules (translation to
- ISO) without data loss. It is clear that a proprietary communication
- protocol (like NETBIOS) can use the proprietary code without translation.
- (Yet, one day, that protocol may well extend to other computers, causing
- difficulty.) But, in internetworking, and especially with electronic
- mail, a computer should not be expected to necessarily know the type of
- machine (hence code) of the other party.
-
- The constructor (the owner of the proprietary code) should define
- this translation precisely but sometimes fails to do so. In consequence,
- one goal of this document is to suggest one as widely as possible.
-
- Terminal emulation deserves a special discussion. For communication
- programs (usually providing VT100 terminal emulation), it is not
- necessary to provide the full features of the higher VT models that can
- switch character codes to achieve international characters support.
- Moreover, it is not desirable to require that the hosts a terminal is
- connected to send character code switching escape sequences in order to
- initiate the use of national characters. What is needed is just to be
- able to set up terminal mode with an initial state of what the GR code
- points (values above 127) display. This way, using ISO 8859 will only
- be a "matter of fact" to the 8-bit-clean host and neither has to know
- about code switching. This is especially true when the only possible
- display a microcomputer can achieve is by translating ISO from the line
- to its own similar character set, like the IBM PC or an Apple Macintosh
- with standard fonts. In short, VT100 emulation is sufficient, but with
- added translation before display and from the keyboard.
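-
- A sketch of those two added translations in Python, assuming ISO 8859-1
- on the line and code page 437 on the PC side (Python's "latin-1" and
- "cp437" codecs). The function names are made up. Note that this version
- replaces characters missing from the other code with "?", which is
- tolerable for a screen but not for file transfer, where an invertible
- table is required.
-
```python
LINE_CODE = "latin-1"    # ISO 8859-1 on the communication line
LOCAL_CODE = "cp437"     # assumed code of the local screen and keyboard


def to_screen(line_bytes: bytes) -> bytes:
    """Translate octets received from the host just before display."""
    return line_bytes.decode(LINE_CODE).encode(LOCAL_CODE, errors="replace")


def from_keyboard(key_bytes: bytes) -> bytes:
    """Translate keyboard driver output just before sending it."""
    return key_bytes.decode(LOCAL_CODE).encode(LINE_CODE, errors="replace")


shown = to_screen(b"caf\xe9")          # e-acute is E9h in ISO 8859-1
sent = from_keyboard(b"\x82")          # e-acute is 82h in code page 437
escape_sequence = to_screen(b"\x1b[2J")
```
-
- ASCII, including the VT100 escape sequences, passes through unchanged,
- so the emulation itself needs no modification.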
-
- Now, one important remark about implementing translation with a
- proprietary code in a communication program. Two methods are possible.
- A) Text is translated at the communication line interface. Hence, the
- proprietary code is used for text in computer memory.
- B) Text is translated at the other system interface (screen, keyboard,
- file). Hence, ISO 8859 is used for text in memory.
- The choice of the method depends on a number of factors.
- - If the communication protocol is such that all data on the line is
- text, method A is the easiest. If there is a mix of text and binary and
- a minimum of interface points where text can be translated cannot be
- found, then method B should be considered.
- - If the system interfaces can be well localized (e. g. routines in the
- program to interface the screen, keyboard and files of the PC), method B
- is easy. Else (e. g. the Macintosh where multiple system interfaces exist
- with text as parameters) method A may be better (unless maybe, on the
- Mac, ISO fonts were used just for this reason, not very practical except
- for a terminal emulation program).
- - If the proprietary code is not unique (like multiple in use on the PC),
- method B is best unless an interface is built to translate the internal
- program messages to the current code.
- - Using ISO in memory makes the program messages more portable.
- Two typical examples: a terminal emulation with file transfer
- (Kermit style) on a PC used method B with advantages; a file transfer
- program (TCP/IP FTP) on a Mac used method A with great simplicity (e. g.
- the filenames in the FTP dialog were translated all together, whereas
- method B would have required acting at various points of the Macintosh
- API).
-
-
-
- Moral.
-
- I can hear those having read this far say they did not suspect such
- problems. You will now understand why it is important to write 8-bit
- clean software, to use a single code on one computer, that by far the
- most interesting to-day is ISO 8859 (the Unix advice) and why
- applications running on inconvertible systems have to translate text.
-
-
- IBM and ISO 8859-1 (general, see details before the IBM tables)
-
- For the PC, IBM has now adopted the character set of ISO 8859-1 with
- a different code. This was done by replacing some characters of the
- original PC code, now called code page 437, to obtain the full character
- set of ISO 8859-1. This new code is called "code page 850" and IBM sees
- it as the preferred code page for all Latin1 customers (it's their
- default code for OS/2). See the appendix D of the "DOS reference manual"
- for a description of 850 and the code pages it may replace: 437, 860, 863
- and 865. Beware, the yen, cent, and two paragraph symbols that existed in
- 437 were moved in 850. When one builds a translation table between 850
- and ISO 8859-1, 32 characters of 850, mainly box-drawing, are left to be
- assigned to the 32 control characters 80-9F of ISO.
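-
- Both remarks can be checked with a short Python sketch, assuming that
- Python's "cp437", "cp850" and "latin-1" codecs match the code pages
- discussed here. The helper names are made up for the illustration.
-
```python
def code_point(ch: str, code_page: str) -> int:
    """Code point of a single character in the given code page."""
    return ch.encode(code_page)[0]


def not_in_latin1(code_page: str) -> list:
    """Code points whose character has no equivalent in ISO 8859-1."""
    missing = []
    for b in range(256):
        ch = bytes([b]).decode(code_page)
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            missing.append(b)
    return missing


# Position of the cent and yen signs in each code page: (437, 850).
moved = {ch: (code_point(ch, "cp437"), code_point(ch, "cp850")) for ch in "¢¥"}

# Characters of 850 left over once the ISO 8859-1 ones are paired off.
leftover = not_in_latin1("cp850")
```
-
- The leftover list is what a translation table has to assign, arbitrarily
- but invertibly, to the ISO control code points.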
-
- For the EBCDIC mainframes, IBM decided that, because terminals were
- already using the ISO-646-like replacements to the US EBCDIC, they had to
- stay compatible. They extended each such "national EBCDIC" to "country
- extended code pages". Thus, there are as many EBCDICs as versions of ISO
- 646 (what ISO 8859 is trying to avoid). None of the original CECPs was
- compatible with the de-facto EBCDIC. Lately, IBM defined CECP 1047 which
- is compatible with (an extension of) the de-facto US EBCDIC (see
- discussion below). In consequence, I consider that CECP 1047 is the most
- interesting EBCDIC code to use, because of the compatibility with the
- vast software base.
-
- CECP 1047 "internationalized industry standard" (my terms)
- CECP 037 for US, Canada-French, Netherlands, Portugal.
- CECP 273 for Germany.
- CECP 277 for Denmark and Norway.
- CECP 278 for Finland and Sweden.
- CECP 280 for Italy.
- CECP 284 for Latin America and Spain.
- CECP 285 for United Kingdom.
- CECP 297 for France.
- CECP 500 for Belgium, Switzerland-French and Switzerland-German.
-
- Like 850, all these codes contain all the characters of ISO 8859-1.
-
- Only the recent CECP 1047 is compatible with a de-facto standard
- EBCDIC, corresponding to a de-facto ASCII/EBCDIC translation, that a huge
- number of products settled on long ago, including software from IBM:
- - all compilers from IBM or others: C, REXX, PL/I, Pascal, for those
- sensitive to the differences in code points,
- - File transfer programs like Kermit, PCTERM, and IBM TCP/IP,
- - In fact, the whole of IBM TCP/IP,
- - Terminal emulation: TTY line mode or 3270 emulation by the 7171,
- - ASCII tapes translation,
- - Products to translate ASCII to EBCDIC on a mainframe: ARCUTIL ...
- - Products that should produce ASCII, but produce EBCDIC because data
- goes through EBCDIC/ASCII translation: e. g. SAS output for Tektronix,
- - Products that convert this output anyway, because the expected
- EBCDIC/ASCII translation does not occur: LINEMODE through the 7171
- transparent mode,
- - Similarly, TPRINT to print in this transparent mode
- - Certainly many other products I don't know of or I forget, because, as
- you see, the de-facto EBCDIC snowballs from one use to the other,
- - Last but far from least, it's the translation made by most gateways
- that relay mail between BITNET and the Internet, i.e. between EBCDIC mail
- and ASCII mail. Of special importance is that of the encoding of data
- that is to be transmitted by e-mail (UUENCODE, BOO, HQX...): if the
- ASCII-EBCDIC-ASCII translation fails to be invertible, decoding fails.
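-
- The last point can be illustrated in Python with two EBCDIC code pages
- for which Python provides codecs, "cp037" and "cp500", used here only as
- stand-ins for two gateways that disagree on the table. The sample line
- is made up; it merely uses punctuation of the kind text encoders emit.
-
```python
def relay(text: str, outbound: str, inbound: str) -> str:
    """ASCII -> EBCDIC with one table, then EBCDIC -> ASCII with another."""
    return text.encode(outbound).decode(inbound)


encoded_line = "M[0]^!"   # made-up line containing brackets, caret, bang

same_table = relay(encoded_line, "cp037", "cp037")
mixed_tables = relay(encoded_line, "cp500", "cp037")

damaged = [a for a, b in zip(encoded_line, mixed_tables) if a != b]
```
-
- With the same table at both ends the line survives; with two different
- tables some characters come back changed and the decoder can only fail.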
-
- The requirement #1 of SHARE is that IBM use a single EBCDIC code for
- Latin group 1 and publish it. Using an extension of de-facto EBCDIC is
- recommended.
-
-
-
- Asynchronous communication
-
- Thanks to the interest of Frank da Cruz and Christine Gianone,
- Kermit now defines specifications to support ISO 8859 (and other codes if
- needed) on the communication line in terminal and file transfer mode. It
- has provision to extend to mixed codes files too.
- John Chandler has extended the traditional translation made by his
- remarkable IBM mainframe Kermits to the specific choice of any CECP or
- the extended de-facto EBCDIC to ISO 8859-1.
- The impressive MSDOS Kermit by Joe Doupnik now also supports
- translation of PC code pages to ISO8859-1.
- Thanks to Paul Placeway, Macintosh Kermit now supports ISO 8859-1 as
- an 8-bit line terminal. Others have taken over the job to complete it.
-
- I think I can speak on behalf of the international computing
- community and enthusiastically thank these people for a work most useful
- to it.
-
-
-
- TCP/IP
-
- Despite a mention I have read in an introduction to the TCP/IP
- communication protocols "provision for hosts with different character
- sets", the idea does not extend much into the standards. In fact, some of
- them even restrict text to 7-bit explicitly and for no better reason than
- some points of forgotten history. No attempt is made to make a statement
- to standardize what must be an 8-bit code so that it be common to all
- machines, just like ASCII is, as explained above.
-
- In practice, it is often no more than a question of implementation:
- use ISO 8859 as the code of a machine or translate the proprietary code
- to ISO 8859. At the time of writing the first version of this text, only
- EBCDIC mainframes translated, because the need appeared evident; the
- translation was restricted to the US ASCII character set, but a simple
- table change extends the scope of all protocols. For users of
- international characters, the same problem and solution exist for any
- host not using ISO 8859. As of this writing, the most important
- applications on the Macintosh have applied the principles: Eudora (POP3)
- by Steve Dorner, Brown tn3270 by Peter DiCamillo, Fetch (FTP client) by
- Jim Matthews, FTPd (FTP server) and other programs by Peter Lewis which
- cope with translation, as of today NCSA/BYU/UCL Telnet by Pascal Maes,
- of course Mac-X from Apple, and others still to be checked. On the IBM
- PC, status quo: only Telnet by IBM (both vt100 and tn3270) and by several
- other firms. Thanks to the authors!
-
- The idea of translating the data does not occur to the people who
- write TCP/IP applications because they are not aware of the problem. If
- the protocol addresses it, the application will probably be written
- correctly in that respect. For example, the specifications of X-Windows
- state that ISO 8859-1 is the code that must be used to exchange text
- between the client and the server of that protocol; and all X-Windows
- applications are correct.
-
- So, short of rewriting most RFCs just for this, what is needed is a
- general TCP/IP statement saying what single code TCP/IP application
- protocols use on communication lines: ISO 8859 with future migration to
- ISO 10646. This would be like adding a minimal presentation layer.
-
- Specific TCP/IP cases.
-
- Telnet. Take the most basic VT100 implementation, treat the keyboard
- as explained above (translating keyboard input to ISO 8859), translate
- ISO to local code before display and you are done. There is no need to
- try to negotiate binary (I am told it even hurts, and to my mind binary
- has nothing to do with the fact that the text a particular terminal uses
- is 8-bit). Note that anyone afraid of the 8th bit can limit his typing to
- ASCII; his host will not return anything else and the upgraded program
- will behave exactly as before. Also note that ISO 8859 does not conflict
- with the 8-bit control characters and that using ISO is a simplification.
- There is no need to wonder or negotiate whether the host will send them:
- if any byte in the range 80-9F comes in, you may treat it as a control.
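-
- The receiving side of such a Telnet can be sketched as follows. This
- is only an illustration in Python: the function names are invented,
- and the local table is a stand-in that is the identity except for one
- entry (ISO E9, e acute, to 8E, its Macintosh code in the tables of this
- document).

```python
def classify(byte: int) -> str:
    """Classify one received byte under ISO 8859 conventions."""
    if byte < 0x20 or byte == 0x7F:
        return "control"   # 7-bit controls and DEL
    if 0x80 <= byte <= 0x9F:
        return "control"   # 8-bit controls, range 80-9F
    return "graphic"       # 20-7E and A0-FF are printable


def to_display(data: bytes, iso_to_local: bytes) -> bytes:
    """Translate graphics to local code; leave controls for the emulator."""
    out = bytearray()
    for byte in data:
        if classify(byte) == "graphic":
            out.append(iso_to_local[byte])
        else:
            out.append(byte)
    return bytes(out)


# Stand-in local table (hypothetical apart from the E9 -> 8E entry).
iso_to_local = bytearray(range(256))
iso_to_local[0xE9] = 0x8E
iso_to_local = bytes(iso_to_local)

shown = to_display(b"caf\xe9\r\n\x9b", iso_to_local)
```

- Only the graphic byte E9 is translated; CR, LF and the 8-bit control
- 9B reach the terminal emulator unchanged.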
-
- Tn3270. Like IBM mainframes, it is forced to translate. So, it's
- just a matter of using the correct tables. It will save you time not to
- try to support all the EBCDIC CECPs. Using CECP 1047 will probably make
- everybody happy. However, make the translation customizable. If someone
- wants things differently, it will probably be a whole installation with
- time to customize it.
-
- SMTP. Although RFC 821 restricts data to 7 bits, it works quite well
- with 8. We use 8-bit mail on Unix (Sun and IBM), on IBM mainframes and on
- Macintosh to the delight of our users. It's just a matter of not crossing
- gateways that strip the 8th bit. For the Internet, we do not use such
- hosts as less preferred MXes and we expect that sites wanting 8 bits will
- do likewise. Together with many other sites, we use ISO 8859-1. No problem!
- So, that is just what is needed for the Internet: kill the 8th-bit
- killers or don't use them. Other networks should be expected to do the
- same with their mail and to use correct gateways to the Internet.
- The BITNET/Internet gateways, for example, should translate between
- ISO 8859-1 and CECP 1047.
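-
- To see why an 8th-bit stripper is harmful rather than merely lossy,
- here is a small Python sketch of what such a gateway does to ISO
- 8859-1 text. The function name and the sample word are chosen for the
- illustration only.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """What a 7-bit gateway does to every byte passing through."""
    return bytes(byte & 0x7F for byte in data)


message = b"Li\xe8ge"                 # "Liege" with e grave, in ISO 8859-1
damaged = strip_eighth_bit(message)   # E8 loses its top bit and becomes 68
survived = damaged == message
```

- The accented letter does not simply disappear: E8 becomes 68, the
- letter "h", so the recipient reads a different and plausible-looking
- word.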
-
- The same general rules for translation as explained above for file
- transfer apply to FTP and other protocols. Note that the text vs binary
- distinction may need to be introduced in additional places. For example,
- NFS would benefit from using it (and best at the file level).
-
-
-
- General conclusions
-
- 1) Every effort should be made so that all operating systems' codes be
- unique and universal, i. e. ISO 8859-x for an 8-bit code, while waiting
- for the perfect unity of a single multibyte code.
-
- 2) Failing that, communication software must palliate a particular system
- weakness and translate data so that it appears to the outside world to
- use the unique data interchange code.
-
- 3) Programmers must deal with 8-bit character codes (and prepare for
- multibyte ones).
-
-
-
- Translation.
-
- I have been looking for constructor-defined or most widely accepted
- complete tables and I explain the reasons for the choices. However, I
- cannot guarantee that another translation will not be used someday. The
- data correspond to my explanations. That's all I can say.
-
-
- DEC
-
- Easy case first. DEC uses ISO 8859-1 (just a few characters of their
- 8-bit code -- pre-dating ISO 8859 -- are different). Nothing to do except
- making sure the 8 bits go through.
-
-
- IBM translations
-
- Since version 1 of this document, IBM has published the following
- "Character Data Representation Architecture" (CDRA) documents:
- GC09-1392-00 Executive Overview
- GC09-1390-00 Level 1, Reference
- GC09-1391-00 Level 1, Registry
- The last of these answers most of the earlier questions about translation.
- IBM has also published a new EBCDIC CECP 1047 that fulfills the
- requirements of compatibility with the previous de-facto EBCDIC. However,
- IBM has made no statement that I know of about supporting it, nor about
- whether this code is intended to be the sole one for Latin-1 languages.
-
- In consequence of the SHARE requirement (the necessity to use a
- single compatible code on IBM mainframes), I think, like many people, that
- only CECP 1047 should be used on EBCDIC mainframes. And, by extension,
- only CP 850 on the PC (but ISO 8859-1 would be better). The PC may also
- use CP 437 (e.g. when 850 is not available) as limited use of a subset of
- the ISO character set. But, even if using CP 437, a PC should use the
- same translation to ISO as for CP 850. Only 4 characters need to
- translate differently and those needing them are expected to use CP 850.
-
- The translation tables listed below are limited to these two codes
- (others are found in a separate file).
- A problem exists with the translation between CP 850 and ISO. As
- published in the CDRA registry, the translation of the ASCII part is not
- a null translation. This has simply been corrected below. But the IBM
- translation also fails to provide round trip integrity with the PC to
- EBCDIC translation published and used by IBM products (specifically,
- 850->500 is not 850->ISO->500). So, this table may be subject to change,
- unless IBM decides that the wrong table is the one between CECP 1047 and
- ISO, or unless they say nothing and do not mind that they have set their
- Communication Manager wrong. The change would only affect the range 80-9F
- of the ISO control characters, though.
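-
- The round trip integrity problem can be stated as a comparison of a
- direct table with the composition of two others. The Python sketch
- below shows the check itself; the three tables are toy stand-ins built
- for the example (identity tables with one swap), not the IBM data, and
- the function names are invented.

```python
def compose(first: bytes, second: bytes) -> bytes:
    """Table equivalent to applying `first`, then `second`."""
    return first.translate(second)


def disagreements(direct: bytes, via_a: bytes, via_b: bytes) -> list:
    """Code points where the direct table differs from the two-step route."""
    indirect = compose(via_a, via_b)
    return [code for code in range(256) if direct[code] != indirect[code]]


identity = bytes(range(256))

# Toy stand-ins: the "PC to ISO" step swaps 80 and 81, the other two
# tables are identities, so the two routes disagree on exactly 80 and 81.
pc_to_iso = bytearray(identity)
pc_to_iso[0x80], pc_to_iso[0x81] = 0x81, 0x80
pc_to_iso = bytes(pc_to_iso)
iso_to_host = identity
pc_to_host_direct = identity

differing = disagreements(pc_to_host_direct, pc_to_iso, iso_to_host)
```

- Run against the real 850->500, 850->ISO and ISO->500 tables, a check of
- this kind would list the code points on which the published tables
- disagree.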
-
- ISO 8859-1 to CECP 1047 (Extended de-facto EBCDIC):
- 00 01 02 03 37 2D 2E 2F 16 05 25 0B 0C 0D 0E 0F
- 10 11 12 13 3C 3D 32 26 18 19 3F 27 1C 1D 1E 1F
- 40 5A 7F 7B 5B 6C 50 7D 4D 5D 5C 4E 6B 60 4B 61
- F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 7A 5E 4C 7E 6E 6F
- 7C C1 C2 C3 C4 C5 C6 C7 C8 C9 D1 D2 D3 D4 D5 D6
- D7 D8 D9 E2 E3 E4 E5 E6 E7 E8 E9 AD E0 BD 5F 6D
- 79 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96
- 97 98 99 A2 A3 A4 A5 A6 A7 A8 A9 C0 4F D0 A1 07
- 20 21 22 23 24 15 06 17 28 29 2A 2B 2C 09 0A 1B
- 30 31 1A 33 34 35 36 08 38 39 3A 3B 04 14 3E FF
- 41 AA 4A B1 9F B2 6A B5 BB B4 9A 8A B0 CA AF BC
- 90 8F EA FA BE A0 B6 B3 9D DA 9B 8B B7 B8 B9 AB
- 64 65 62 66 63 67 9E 68 74 71 72 73 78 75 76 77
- AC 69 ED EE EB EF EC BF 80 FD FE FB FC BA AE 59
- 44 45 42 46 43 47 9C 48 54 51 52 53 58 55 56 57
- 8C 49 CD CE CB CF CC E1 70 DD DE DB DC 8D 8E DF
- inverted,
- CECP 1047 (Extended de-facto EBCDIC) to ISO 8859-1:
- 00 01 02 03 9C 09 86 7F 97 8D 8E 0B 0C 0D 0E 0F
- 10 11 12 13 9D 85 08 87 18 19 92 8F 1C 1D 1E 1F
- 80 81 82 83 84 0A 17 1B 88 89 8A 8B 8C 05 06 07
- 90 91 16 93 94 95 96 04 98 99 9A 9B 14 15 9E 1A
- 20 A0 E2 E4 E0 E1 E3 E5 E7 F1 A2 2E 3C 28 2B 7C
- 26 E9 EA EB E8 ED EE EF EC DF 21 24 2A 29 3B 5E
- 2D 2F C2 C4 C0 C1 C3 C5 C7 D1 A6 2C 25 5F 3E 3F
- F8 C9 CA CB C8 CD CE CF CC 60 3A 23 40 27 3D 22
- D8 61 62 63 64 65 66 67 68 69 AB BB F0 FD FE B1
- B0 6A 6B 6C 6D 6E 6F 70 71 72 AA BA E6 B8 C6 A4
- B5 7E 73 74 75 76 77 78 79 7A A1 BF D0 5B DE AE
- AC A3 A5 B7 A9 A7 B6 BC BD BE DD A8 AF 5D B4 D7
- 7B 41 42 43 44 45 46 47 48 49 AD F4 F6 F2 F3 F5
- 7D 4A 4B 4C 4D 4E 4F 50 51 52 B9 FB FC F9 FA FF
- 5C F7 53 54 55 56 57 58 59 5A B2 D4 D6 D2 D3 D5
- 30 31 32 33 34 35 36 37 38 39 B3 DB DC D9 DA 9F
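-
- For programmers who want to use these tables directly, the following
- Python sketch loads the "ISO 8859-1 to CECP 1047" table as printed
- above, derives the inverse instead of typing it a second time, and
- checks the round trip. The variable names are mine; the data is copied
- from the table above.

```python
ISO_TO_1047_TEXT = """
00 01 02 03 37 2D 2E 2F 16 05 25 0B 0C 0D 0E 0F
10 11 12 13 3C 3D 32 26 18 19 3F 27 1C 1D 1E 1F
40 5A 7F 7B 5B 6C 50 7D 4D 5D 5C 4E 6B 60 4B 61
F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 7A 5E 4C 7E 6E 6F
7C C1 C2 C3 C4 C5 C6 C7 C8 C9 D1 D2 D3 D4 D5 D6
D7 D8 D9 E2 E3 E4 E5 E6 E7 E8 E9 AD E0 BD 5F 6D
79 81 82 83 84 85 86 87 88 89 91 92 93 94 95 96
97 98 99 A2 A3 A4 A5 A6 A7 A8 A9 C0 4F D0 A1 07
20 21 22 23 24 15 06 17 28 29 2A 2B 2C 09 0A 1B
30 31 1A 33 34 35 36 08 38 39 3A 3B 04 14 3E FF
41 AA 4A B1 9F B2 6A B5 BB B4 9A 8A B0 CA AF BC
90 8F EA FA BE A0 B6 B3 9D DA 9B 8B B7 B8 B9 AB
64 65 62 66 63 67 9E 68 74 71 72 73 78 75 76 77
AC 69 ED EE EB EF EC BF 80 FD FE FB FC BA AE 59
44 45 42 46 43 47 9C 48 54 51 52 53 58 55 56 57
8C 49 CD CE CB CF CC E1 70 DD DE DB DC 8D 8E DF
"""

# Remove all whitespace before decoding the hexadecimal digits.
ISO_TO_1047 = bytes.fromhex("".join(ISO_TO_1047_TEXT.split()))

# Derive the reverse table: entry [ebcdic] holds the ISO code.
inverse = bytearray(256)
for iso_code, ebcdic_code in enumerate(ISO_TO_1047):
    inverse[ebcdic_code] = iso_code
EBCDIC_1047_TO_ISO = bytes(inverse)

all_codes = bytes(range(256))
round_trip = all_codes.translate(ISO_TO_1047).translate(EBCDIC_1047_TO_ISO)
hello = b"Hello".translate(ISO_TO_1047)
```

- Deriving one direction from the other avoids the risk of the two
- printed tables drifting apart through a typing error.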
-
- ISO 8859-1 to IBM PC code page 850:
- 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
- 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
- 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
- 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
- 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
- 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
- 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
- 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
- BA CD C9 BB C8 BC CC B9 CB CA CE DF DC DB FE F2
- B3 C4 DA BF C0 D9 C3 B4 C2 C1 C5 B0 B1 B2 D5 9F
- FF AD BD 9C CF BE DD F5 F9 B8 A6 AE AA F0 A9 EE
- F8 F1 FD FC EF E6 F4 FA F7 FB A7 AF AC AB F3 A8
- B7 B5 B6 C7 8E 8F 92 80 D4 90 D2 D3 DE D6 D7 D8
- D1 A5 E3 E0 E2 E5 99 9E 9D EB E9 EA 9A ED E8 E1
- 85 A0 83 C6 84 86 91 87 8A 82 88 89 8D A1 8C 8B
- D0 A4 95 A2 93 E4 94 F6 9B 97 A3 96 81 EC E7 98
- inverted,
- IBM PC code page 850 to ISO 8859-1:
- 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
- 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
- 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
- 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F
- 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F
- 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F
- 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F
- 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F
- C7 FC E9 E2 E4 E0 E5 E7 EA EB E8 EF EE EC C4 C5
- C9 E6 C6 F4 F6 F2 FB F9 FF D6 DC F8 A3 D8 D7 9F
- E1 ED F3 FA F1 D1 AA BA BF AE AC BD BC A1 AB BB
- 9B 9C 9D 90 97 C1 C2 C0 A9 87 80 83 85 A2 A5 93
- 94 99 98 96 91 9A E3 C3 84 82 89 88 86 81 8A A4
- F0 D0 CA CB C8 9E CD CE CF 95 92 8D 8C A6 CC 8B
- D3 DF D4 D2 F5 D5 B5 FE DE DA DB D9 FD DD AF B4
- AD B1 8F BE B6 A7 F7 B8 B0 A8 B7 B9 B3 B2 8E A0
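-
- The graphic part of the 850 table can be cross-checked against an
- independent implementation. The Python sketch below compares rows Ax to
- Fx of the "ISO 8859-1 to IBM PC code page 850" table above with
- Python's built-in "cp850" codec, on the assumption that this codec
- describes the same code page. Rows 8x and 9x are left out: they place
- the ISO control characters on PC graphic positions by convention, and
- the codec has nothing comparable.

```python
# Rows Ax-Fx of the ISO 8859-1 to code page 850 table, copied from above.
TABLE_A0_FF = bytes.fromhex("".join("""
FF AD BD 9C CF BE DD F5 F9 B8 A6 AE AA F0 A9 EE
F8 F1 FD FC EF E6 F4 FA F7 FB A7 AF AC AB F3 A8
B7 B5 B6 C7 8E 8F 92 80 D4 90 D2 D3 DE D6 D7 D8
D1 A5 E3 E0 E2 E5 99 9E 9D EB E9 EA 9A ED E8 E1
85 A0 83 C6 84 86 91 87 8A 82 88 89 8D A1 8C 8B
D0 A4 95 A2 93 E4 94 F6 9B 97 A3 96 81 EC E7 98
""".split()))

# The same 96 characters taken through Python's codecs.
iso_graphics = bytes(range(0xA0, 0x100))
via_codec = iso_graphics.decode("latin-1").encode("cp850")

# ISO code points on which the table and the codec disagree.
mismatches = [
    0xA0 + offset
    for offset, (from_table, from_codec) in enumerate(zip(TABLE_A0_FF, via_codec))
    if from_table != from_codec
]
```

- An empty list of mismatches means that the two sources agree on all 96
- graphic characters of ISO 8859-1.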
-
-
- Apple Macintosh
-
- Apple Inc. did not respond to the request for an official
- translation table between ISO 8859-1 and the Macintosh code that would
- fulfill the data processing requirement of being invertible for the 256
- code points. So, I built one and suggested that the Kermit repository
- store the data and be the reference for it.
- I made the translation as compatible as possible with an existing
- translation table, the official "Apple File Exchange" (AFE) from Apple Inc.,
- which translates between IBM PC code and Apple's and hence, indirectly,
- to ISO 8859-1. Many characters of the Apple fonts belong to ISO 8859-1 and
- caused no problem. The translation of some characters became
- incompatible, because the AFE is homographic, which makes it fail to be
- invertible (e.g. 2 superscript translates to plain 2), and because the
- AFE is based on IBM PC 437, which contains some characters of the
- Macintosh set that have been replaced (giving IBM PC code page 850)
- with characters of ISO 8859-1 (for example, it matched Mac Omega to a 437
- Omega that became an 850 U circumflex that now has to match the Mac's F3).
- Several translations that remained arbitrary were chosen to be
- homographic or mnemonic. Leftovers from the 80-FF Mac range have simply
- been lined up in the 80-9F range of ISO 8859-1 without any particular
- reason.
- This is a second version of the translation; 6 characters of the
- standard Apple code whose translation was arbitrary are now translated
- according to their Icelandic replacements (which also changed the
- translation of the Apple code points to which these ISO characters
- formerly translated).
- The "Why" column below comments on the choices:
- Blank: compatible with AFE (same in both PC 437 and 850).
- S: not in 437/AFE, but ISO character is in "Standard Apple Character Set"
- E: same for "SACS with extensions" (on newer systems only).
- I: translation according to an Icelandic Apple font.
- A: arbitrary (but the choice is sometimes guided by lookalike or mnemonic
- aspects, and a few characters of PC 437 are preserved).
-
- ISO | Mac | ISO 8859-1 name (IBM) | Why | Mac name (Paul Placeway)
- 80 | A5 | | A | bullet
- 81 | AA | | A | trade mark
- 82 | AD | | A | not equal
- 83 | B0 | | A | infinity
- 84 | B3 | | A | greater than or equal to
- 85 | B7 | | A | Uppercase Sigma (Summation)
- 86 | BA | | A | integral
- 87 | BD | | A | Uppercase Omega
- 88 | C3 | | A | radical (square root)
- 89 | C5 | | A | approx equal
- 8A | C9 | | A | ellipsis (...)
- 8B | D1 | | A | em dash
- 8C | D4 | | A | left singlequote ( ` )
- 8D | D9 | | A | Y dieresis
- 8E | DA | | A | divide (a / with less slope)
- 8F | B6 | | A | partial
- 90 | C6 | | A | Uppercase Delta
- 91 | CE | | A | OE
- 92 | E2 | | A | baseline single close quote
- 93 | E3 | | A | baseline double close quote
- 94 | E4 | | A | per thousand
- 95 | F0 | | A | (closed) Apple
- 96 | F6 | | A | circumflex
- 97 | F7 | | A | tilde
- 98 | F9 | | A | breve
- 99 | FA | | A | dot accent
- 9A | FB | | A | ring accent
- 9B | FD | | A | Hungarian umlaut
- 9C | FE | | A | ogonek
- 9D | FF | | A | caron
- 9E | F5 | | A | dotless i
- 9F | C4 | | A | florin
- A0 | CA | required space | A | non-printing space
- A1 | C1 | exclamation point inv | | inverted !
- A2 | A2 | cent sign | S | cent
- A3 | A3 | pound sign | | sterling
- A4 | DB | int. currency symbol | E | generic currency
- A5 | B4 | Yen sign | S | yen
- A6 | CF | Vertical Line, Broken | A | oe
- A7 | A4 | section/paragraph symb| S | section
- A8 | AC | diaeresis,umlaut acc | S | dieresis (AKA umlaut)
- A9 | A9 | Copyright sign | | copyright ( (C) )
- AA | BB | ordinal indicator fem | | feminine ordinal
- AB | C7 | left angle quotes | | left guillemot (like << )
- AC | C2 | logical NOT, EOL symb | | logical not
- AD | D0 | Syllable Hyphen | A | en dash
- AE | A8 | Regist.Trade Mark sym | S | registered ( (R) )
- AF | F8 | overline | A | macron
- B0 | A1 | Degree Symbol | | superscript ring
- B1 | B1 | plus or minus sign | | plus minus
- B2 | D3 | 2 superscript | A | right doublequote ( '' )
- B3 | D2 | 3 superscript | A | left doublequote ( `` )
- B4 | AB | acute accent | S | acute accent
- B5 | B5 | micro symbol | | greek lowercase mu
- B6 | A6 | paragraph symbol USA | S | paragraph
- B7 | E1 | Middle dot accent | E | centered (small) dot
- B8 | FC | cedilla accent | E | cedilla
- B9 | D5 | 1 superscript | A | right singlequote ( ' )
- BA | BC | ordinal indicator mas | | masculine ordinal
- BB | C8 | right angle quotes | | right guillemot (like >> )
- BC | B9 | one quarter | A | lowercase pi
- BD | B8 | one half | A | Uppercase Pi (Power)
- BE | B2 | three quarters | A | less than or equal to
- BF | C0 | Question mark inverted| | inverted ?
- C0 | CB | A grave capital | S | A grave
- C1 | E7 | A acute capital | E | A acute
- C2 | E5 | A circumflex capital | E | A circumflex
- C3 | CC | A tilde capital | S | A tilde
- C4 | 80 | A diaeresis capital | | A dieresis
- C5 | 81 | A overcircle capital | | A ring
- C6 | AE | AE diphthong capital | | AE
- C7 | 82 | C cedilla capital | | C cedilla
- C8 | E9 | E grave capital | E | E grave
- C9 | 83 | E acute capital | | E acute
- CA | E6 | E circumflex capital | S | E circumflex
- CB | E8 | E diaeresis capital | E | E dieresis
- CC | ED | I grave capital | E | I grave
- CD | EA | I acute capital | E | I acute
- CE | EB | I circumflex capital | E | I circumflex
- CF | EC | I diaeresis capital | E | I dieresis
- D0 | DC | Eth Icelandic capital | I | < or Eth Icelandic capital
- D1 | 84 | N tilde capital | | N tilde
- D2 | F1 | O grave capital | E | O grave
- D3 | EE | O acute capital | E | O acute
- D4 | EF | O circumflex capital | E | O circumflex
- D5 | CD | O tilde capital | S | O tilde
- D6 | 85 | O diaeresis capital | | O dieresis
- D7 | D7 | Multiply sign | A | lozenge (open diamond)
- D8 | AF | O slash capital | E | O slash
- D9 | F4 | U grave capital | E | U grave
- DA | F2 | U acute capital | E | U acute
- DB | F3 | U circumflex capital | E | U circumflex
- DC | 86 | U diaeresis capital | | U dieresis
- DD | A0 | Y acute Capital | I | dagger or Y acute Capital
- DE | DE | Thorn Icelandic capital | I | fi or Thorn Icelandic capital
- DF | A7 | sharp s small | | Es-zet (German double s)
- E0 | 88 | a grave small | | a grave
- E1 | 87 | a acute small | | a acute
- E2 | 89 | a circumflex small | | a circumflex
- E3 | 8B | a tilde small | S | a tilde
- E4 | 8A | a diaeresis small | | a dieresis
- E5 | 8C | a overcircle small | | a ring
- E6 | BE | ae diphthong small | | ae
- E7 | 8D | c cedilla small | | c cedilla
- E8 | 8F | e grave small | | e grave
- E9 | 8E | e acute small | | e acute
- EA | 90 | e circumflex small | | e circumflex
- EB | 91 | e diaeresis small | | e dieresis
- EC | 93 | i grave small | | i grave
- ED | 92 | i acute small | | i acute
- EE | 94 | i circumflex small | | i circumflex
- EF | 95 | i diaeresis small | | i dieresis
- F0 | DD | Eth Icelandic small | I | > or Eth Icelandic small
- F1 | 96 | n tilde small | | n tilde
- F2 | 98 | o grave small | | o grave
- F3 | 97 | o acute small | | o acute
- F4 | 99 | o circumflex small | | o circumflex
- F5 | 9B | o tilde small | S | o tilde
- F6 | 9A | o diaeresis small | | o dieresis
- F7 | D6 | Divide sign | | divide
- F8 | BF | o slash small | S | o slash
- F9 | 9D | u grave small | | u grave
- FA | 9C | u acute small | | u acute
- FB | 9E | u circumflex small | | u circumflex
- FC | 9F | u diaeresis small | | u dieresis
- FD | E0 | y acute small | I | double dagger or y acute small
- FE | DF | Thorn Icelandic small | I | fl or Thorn Icelandic small
- FF | D8 | y diaeresis small | | y dieresis
-
- data 'taBL' (1001, "Translate In", purgeable) {
- /* Translation from ISO 8859-1 to Macintosh extended code */
- /* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */
- /*0x*/ $"00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F"
- /*1x*/ $"10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F"
- /*2x*/ $"20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F"
- /*3x*/ $"30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F"
- /*4x*/ $"40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F"
- /*5x*/ $"50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F"
- /*6x*/ $"60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F"
- /*7x*/ $"70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F"
- /*8x*/ $"A5 AA AD B0 B3 B7 BA BD C3 C5 C9 D1 D4 D9 DA B6"
- /*9x*/ $"C6 CE E2 E3 E4 F0 F6 F7 F9 FA FB FD FE FF F5 C4"
- /*Ax*/ $"CA C1 A2 A3 DB B4 CF A4 AC A9 BB C7 C2 D0 A8 F8"
- /*Bx*/ $"A1 B1 D3 D2 AB B5 A6 E1 FC D5 BC C8 B9 B8 B2 C0"
- /*Cx*/ $"CB E7 E5 CC 80 81 AE 82 E9 83 E6 E8 ED EA EB EC"
- /*Dx*/ $"DC 84 F1 EE EF CD 85 D7 AF F4 F2 F3 86 A0 DE A7"
- /*Ex*/ $"88 87 89 8B 8A 8C BE 8D 8F 8E 90 91 93 92 94 95"
- /*Fx*/ $"DD 96 98 97 99 9B 9A D6 BF 9D 9C 9E 9F E0 DF D8"
- };
-
- data 'taBL' (1002, "Translate Out", purgeable) {
- /* Translation from Macintosh extended code to ISO 8859-1 */
- /* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */
- /*0x*/ $"00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F"
- /*1x*/ $"10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F"
- /*2x*/ $"20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F"
- /*3x*/ $"30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F"
- /*4x*/ $"40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F"
- /*5x*/ $"50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F"
- /*6x*/ $"60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F"
- /*7x*/ $"70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E 7F"
- /*8x*/ $"C4 C5 C7 C9 D1 D6 DC E1 E0 E2 E4 E3 E5 E7 E9 E8"
- /*9x*/ $"EA EB ED EC EE EF F1 F3 F2 F4 F6 F5 FA F9 FB FC"
- /*Ax*/ $"DD B0 A2 A3 A7 80 B6 DF AE A9 81 B4 A8 82 C6 D8"
- /*Bx*/ $"83 B1 BE 84 A5 B5 8F 85 BD BC 86 AA BA 87 E6 F8"
- /*Cx*/ $"BF A1 AC 88 9F 89 90 AB BB 8A A0 C0 C3 D5 91 A6"
- /*Dx*/ $"AD 8B B3 B2 8C B9 F7 D7 FF 8D 8E A4 D0 F0 DE FE"
- /*Ex*/ $"FD B7 92 93 94 C2 CA C1 CB C8 CD CE CF CC D3 D4"
- /*Fx*/ $"95 D2 DA DB D9 9E 96 97 AF 98 99 9A B8 9B 9C 9D"
- };
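-
- The two resource listings above can be read back into byte tables with
- a few lines of code. The Python sketch below is an assumption about how
- one might do it, not part of any Apple tool: it simply collects every
- $"..." hexadecimal string and ignores the comments. Only rows 8x and 9x
- of the "Translate In" resource are quoted, to keep the example short.

```python
import re

REZ_EXCERPT = '''
data 'taBL' (1001, "Translate In", purgeable) {
/* x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 xA xB xC xD xE xF */
/*8x*/ $"A5 AA AD B0 B3 B7 BA BD C3 C5 C9 D1 D4 D9 DA B6"
/*9x*/ $"C6 CE E2 E3 E4 F0 F6 F7 F9 FA FB FD FE FF F5 C4"
};
'''


def rez_hex_bytes(source: str) -> bytes:
    """Concatenate every $"..." hex string found in the resource text."""
    chunks = re.findall(r'\$"([0-9A-Fa-f ]*)"', source)
    return bytes.fromhex("".join(chunks).replace(" ", ""))


rows = rez_hex_bytes(REZ_EXCERPT)
```

- Fed with a complete resource, the same function would return the 256
- bytes of the table, ready to be used as a translation table.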
-
-
-
- ISO 8859-1
-
- Here is a list of names and a graphic representation of the ISO 8859-1
- code. The well-known ASCII part and control characters have been left out
- to shorten the text. The lists are included for practical programming
- help only. In particular, the "bitmaps" are nothing official. For
- convenience, two lists of names and acronyms are given: the first comes
- from IBM, the second from a list of characters of the standard ISO 6937.
-
- Code point in hexadecimal / Acronym / Name. Origin: IBM.
-
- A0 | SP30 | required space D0 | LD62 | Eth Icelandic capital
- A1 | SP03 | exclamation point inv D1 | LN20 | N tilde capital
- A2 | SC04 | cent sign D2 | LO14 | O grave capital
- A3 | SC02 | pound sign D3 | LO12 | O acute capital
- A4 | SC01 | int. currency symbol D4 | LO16 | O circumflex capital
- A5 | SC05 | Yen sign D5 | LO20 | O tilde capital
- A6 | SM65 | Vertical Line, Broken D6 | LO18 | O diaeresis capital
- A7 | SM24 | section/paragraph symb D7 | SA07 | Multiply sign
- A8 | SD17 | diaeresis,umlaut acc D8 | LO62 | O slash capital
- A9 | SM52 | Copyright sign D9 | LU14 | U grave capital
- AA | SM21 | ordinal indicator fem DA | LU12 | U acute capital
- AB | SP17 | left angle quotes DB | LU16 | U circumflex capital
- AC | SM66 | logical NOT, EOL symb DC | LU18 | U diaeresis capital
- AD | SP32 | Syllable Hyphen DD | LY12 | Y acute Capital
- AE | SM53 | Regist.Trade Mark sym DE | LT64 | Thorn Icelandic capital
- AF | SM15 | overline DF | LS61 | sharp s small
- B0 | SM19 | Degree Symbol E0 | LA13 | a grave small
- B1 | SA02 | plus or minus sign E1 | LA11 | a acute small
- B2 | ND021| 2 superscript E2 | LA15 | a circumflex small
- B3 | ND031| 3 superscript E3 | LA19 | a tilde small
- B4 | SD11 | acute accent E4 | LA17 | a diaeresis small
- B5 | SM17 | micro symbol E5 | LA27 | a overcircle small
- B6 | SM25 | paragraph symbol USA E6 | LA51 | ae diphthong small
- B7 | SD63 | Middle dot accent E7 | LC41 | c cedilla small
- B8 | SD41 | cedilla accent E8 | LE13 | e grave small
- B9 | ND011| 1 superscript E9 | LE11 | e acute small
- BA | SM20 | ordinal indicator mas EA | LE15 | e circumflex small
- BB | SP18 | right angle quotes EB | LE17 | e diaeresis small
- BC | NF04 | one quarter EC | LI13 | i grave small
- BD | NF01 | one half ED | LI11 | i acute small
- BE | NF05 | three quarters EE | LI15 | i circumflex small
- BF | SP16 | Question mark inverted EF | LI17 | i diaeresis small
- C0 | LA14 | A grave capital F0 | LD63 | Eth Icelandic small
- C1 | LA12 | A acute capital F1 | LN19 | n tilde small
- C2 | LA16 | A circumflex capital F2 | LO13 | o grave small
- C3 | LA20 | A tilde capital F3 | LO11 | o acute small
- C4 | LA18 | A diaeresis capital F4 | LO15 | o circumflex small
- C5 | LA28 | A overcircle capital F5 | LO19 | o tilde small
- C6 | LA52 | AE diphthong capital F6 | LO17 | o diaeresis small
- C7 | LC42 | C cedilla capital F7 | SA06 | Divide sign
- C8 | LE14 | E grave capital F8 | LO61 | o slash small
- C9 | LE12 | E acute capital F9 | LU13 | u grave small
- CA | LE16 | E circumflex capital FA | LU11 | u acute small
- CB | LE18 | E diaeresis capital FB | LU15 | u circumflex small
- CC | LI14 | I grave capital FC | LU17 | u diaeresis small
- CD | LI12 | I acute capital FD | LY11 | y acute small
- CE | LI16 | I circumflex capital FE | LT63 | Thorn Icelandic small
- CF | LI18 | I diaeresis capital FF | LY17 | y diaeresis small
-
- Names and slightly different acronyms from the ISO 6937 repertoire
-
- A0 SP31 NO-BREAK SPACE
- A1 SP03 INVERTED EXCLAMATION MARK
- A2 SC04 CENT SIGN
- A3 SC02 POUND SIGN
- A4 SC01 CURRENCY SIGN
- A5 SC05 YEN SIGN
- A6 SM65 BROKEN BAR
- A7 SM24 PARAGRAPH SIGN
- A8 SD17 DIAERESIS
- A9 SM52 COPYRIGHT SIGN
- AA SM21 FEMININE ORDINAL INDICATOR
- AB SP17 LEFT POINTING DOUBLE ANGLE QUOTATION MARK
- AC SM66 NOT SIGN
- AD SP32 SOFT HYPHEN
- AE SM53 REGISTERED TRADE MARK SIGN
- AF SD31 MACRON
- B0 SM19 DEGREE SIGN
- B1 SA02 PLUS-MINUS SIGN
- B2 NS02 SUPERSCRIPT TWO
- B3 NS03 SUPERSCRIPT THREE
- B4 SD11 ACUTE ACCENT
- B5 SM17 MICRO SIGN
- B6 SM25 PILCROW SIGN
- B7 SM26 MIDDLE DOT
- B8 SD41 CEDILLA
- B9 NS01 SUPERSCRIPT ONE
- BA SM20 MASCULINE ORDINAL INDICATOR
- BB SP18 RIGHT POINTING DOUBLE ANGLE QUOTATION MARK
- BC NF04 VULGAR FRACTION ONE-QUARTER
- BD NF01 VULGAR FRACTION ONE-HALF
- BE NF05 VULGAR FRACTION THREE-QUARTERS
- BF SP16 INVERTED QUESTION MARK
- C0 LA14 LATIN CAPITAL LETTER A WITH GRAVE ACCENT
- C1 LA12 LATIN CAPITAL LETTER A WITH ACUTE ACCENT
- C2 LA16 LATIN CAPITAL LETTER A WITH CIRCUMFLEX ACCENT
- C3 LA20 LATIN CAPITAL LETTER A WITH TILDE
- C4 LA18 LATIN CAPITAL LETTER A WITH DIAERESIS
- C5 LA28 LATIN CAPITAL LETTER A WITH RING ABOVE
- C6 LA52 LATIN CAPITAL LIGATURE AE
- C7 LC42 LATIN CAPITAL LETTER C WITH CEDILLA
- C8 LE14 LATIN CAPITAL LETTER E WITH GRAVE ACCENT
- C9 LE12 LATIN CAPITAL LETTER E WITH ACUTE ACCENT
- CA LE16 LATIN CAPITAL LETTER E WITH CIRCUMFLEX ACCENT
- CB LE18 LATIN CAPITAL LETTER E WITH DIAERESIS
- CC LI14 LATIN CAPITAL LETTER I WITH GRAVE ACCENT
- CD LI12 LATIN CAPITAL LETTER I WITH ACUTE ACCENT
- CE LI16 LATIN CAPITAL LETTER I WITH CIRCUMFLEX ACCENT
- CF LI18 LATIN CAPITAL LETTER I WITH DIAERESIS
- D0 LD62 LATIN CAPITAL LETTER D WITH STROKE
- D1 LN20 LATIN CAPITAL LETTER N WITH TILDE
- D2 LO14 LATIN CAPITAL LETTER O WITH GRAVE ACCENT
- D3 LO12 LATIN CAPITAL LETTER O WITH ACUTE ACCENT
- D4 LO16 LATIN CAPITAL LETTER O WITH CIRCUMFLEX ACCENT
- D5 LO20 LATIN CAPITAL LETTER O WITH TILDE
- D6 LO18 LATIN CAPITAL LETTER O WITH DIAERESIS
- D7 SA07 MULTIPLICATION SIGN
- D8 LO62 LATIN CAPITAL LETTER O WITH OBLIQUE STROKE
- D9 LU14 LATIN CAPITAL LETTER U WITH GRAVE ACCENT
- DA LU12 LATIN CAPITAL LETTER U WITH ACUTE ACCENT
- DB LU16 LATIN CAPITAL LETTER U WITH CIRCUMFLEX ACCENT
- DC LU18 LATIN CAPITAL LETTER U WITH DIAERESIS
- DD LY12 LATIN CAPITAL LETTER Y WITH ACUTE ACCENT
- DE LT64 LATIN CAPITAL LETTER ICELANDIC THORN
- DF LS61 LATIN SMALL LETTER GERMAN SHARP S
- E0 LA13 LATIN SMALL LETTER A WITH GRAVE ACCENT
- E1 LA11 LATIN SMALL LETTER A WITH ACUTE ACCENT
- E2 LA15 LATIN SMALL LETTER A WITH CIRCUMFLEX ACCENT
- E3 LA19 LATIN SMALL LETTER A WITH TILDE
- E4 LA17 LATIN SMALL LETTER A WITH DIAERESIS
- E5 LA27 LATIN SMALL LETTER A WITH RING ABOVE
- E6 LA51 LATIN SMALL LIGATURE AE
- E7 LC41 LATIN SMALL LETTER C WITH CEDILLA
- E8 LE13 LATIN SMALL LETTER E WITH GRAVE ACCENT
- E9 LE11 LATIN SMALL LETTER E WITH ACUTE ACCENT
- EA LE15 LATIN SMALL LETTER E WITH CIRCUMFLEX ACCENT
- EB LE17 LATIN SMALL LETTER E WITH DIAERESIS
- EC LI13 LATIN SMALL LETTER I WITH GRAVE ACCENT
- ED LI11 LATIN SMALL LETTER I WITH ACUTE ACCENT
- EE LI15 LATIN SMALL LETTER I WITH CIRCUMFLEX ACCENT
- EF LI17 LATIN SMALL LETTER I WITH DIAERESIS
- F0 LD63 LATIN SMALL LETTER ICELANDIC ETH
- F1 LN19 LATIN SMALL LETTER N WITH TILDE
- F2 LO13 LATIN SMALL LETTER O WITH GRAVE ACCENT
- F3 LO11 LATIN SMALL LETTER O WITH ACUTE ACCENT
- F4 LO15 LATIN SMALL LETTER O WITH CIRCUMFLEX ACCENT
- F5 LO19 LATIN SMALL LETTER O WITH TILDE
- F6 LO17 LATIN SMALL LETTER O WITH DIAERESIS
- F7 SA06 DIVISION SIGN
- F8 LO61 LATIN SMALL LETTER O WITH OBLIQUE STROKE
- F9 LU13 LATIN SMALL LETTER U WITH GRAVE ACCENT
- FA LU11 LATIN SMALL LETTER U WITH ACUTE ACCENT
- FB LU15 LATIN SMALL LETTER U WITH CIRCUMFLEX ACCENT
- FC LU17 LATIN SMALL LETTER U WITH DIAERESIS
- FD LY11 LATIN SMALL LETTER Y WITH ACUTE ACCENT
- FE LT63 LATIN SMALL LETTER ICELANDIC THORN
- FF LY17 LATIN SMALL LETTER Y WITH DIAERESIS
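-
- For practical programming, a names list of this kind can also be
- produced by a program. The Python sketch below relies on the fact that
- the 256 code points of ISO 8859-1 are the first 256 of Unicode, and
- asks Python's unicodedata module for the names. These are the Unicode
- names, whose wording differs slightly from the two lists above; the
- function name is mine.

```python
import unicodedata


def latin1_name(code: int) -> str:
    """Unicode name of the character at an ISO 8859-1 code point."""
    return unicodedata.name(bytes([code]).decode("latin-1"))


# Print the names of the whole graphic range A0-FF.
for code in range(0xA0, 0x100):
    print(f"{code:02X}  {latin1_name(code)}")
```

- The range is limited to A0-FF on purpose: unicodedata.name raises a
- ValueError for the control characters, which have no name in that
- database.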
-
- ISO 8859-1 by [coarse, bandwidth saving] pictures
-
- -------------------------------------------------------------------------
- | A0 | A1 | A2 | A3 | A4 | A5 | A6 | A7 |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | | XX | XX | XXX | | XX XX | XX | XXXXX |
- | | | XX | XX XX |XX XX | XX XX | XX | XX X|
- | | XX | XXXXXX | XX X | XXXXX | XXXX | XX | XXXX |
- | | XX |XX |XXXX |XX XX | XXXXXX | | XX XX |
- | | XXXX |XX | XX |XX XX | XX | | XX XX |
- | | XXXX | XXXXXX | XX XX | XXXXX | XXXXXX | XX | XXXX |
- | | XX | XX |XXXXXX |XX XX | XX | XX |X XX |
- | | | XX | | | XX | XX | XXXXX |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | A8 | A9 | AA | AB | AC | AD | AE | AF |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | | XXXXXX | XXXX | | | | XXXXXX |XXXXXXXX|
- |XX XX |X X| XX XX | XX XX| | |X X| |
- | |X XXX X| XX XX | XX XX | | |X XXX X| |
- | |X X X| XXXXX |XX XX |XXXXXXX | XXXXXX |X X X X| |
- | |X X X| | XX XX | XX | |X XXX X| |
- | |X XXX X| XXXXXX | XX XX| XX | |X X X X| |
- | |X X| | | | |X X| |
- | | XXXXXX | | | | | XXXXXX | |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | B0 | B1 | B2 | B3 | B4 | B5 | B6 | B7 |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | XXX | XX | XXXX | XXXX | XX | | XXXXXXX| |
- | XX XX | XX | XX | XX | XX | |XX XX XX| |
- | XX XX | XXXXXX | XX | XXX | XX | XX XX |XX XX XX| |
- | XXX | XX | XX | XX | | XX XX | XXXX XX| XX |
- | | XX | XXXXX | XXXX | | XX XX | XX XX| |
- | | | | | | XX XX | XX XX| |
- | | XXXXXX | | | | XXXXX | XX XX| |
- | | | | | |XX | | |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | B8 | B9 | BA | BB | BC | BD | BE | BF |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | | XX | XXX | | XX XX| XX XX|XXX X| XX |
- | | XXX | XX XX |XX XX |XXX XX |XXX XX | XX X | |
- | | XX | XX XX | XX XX | XX XX | XX XX |XXX X | XX |
- | | XX | XXX | XX XX| XXXX X | XXXXXX | XXX X | XX |
- | | XXXX | | XX XX | XX XX | XX XX|XXXX XX | XX |
- | XX | | XXXXX |XX XX | XX X X | XX XX | X X X | XX XX|
- | XX | | | |XX XXXXX|XX XX | X XXXXX| XXXXX |
- | XXX | | | | XX | XXXX|X XX | |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | XX | XX | XXXXX | XXX XX |XX XX | XXX | XXXXX | XXXXX |
- | XX | XX |X X |XX XXX | XXX | XX XX | XX XX |XX XX |
- | XXX | XXX | XXX | XXX | XX XX | XXXXX |XX XX |XX |
- | XX XX | XX XX | XX XX | XX XX |XX XX |XX XX |XXXXXXX |XX |
- |XX XX |XX XX |XX XX |XX XX |XXXXXXX |XXXXXXX |XX XX |XX XX |
- |XXXXXXX |XXXXXXX |XXXXXXX |XXXXXXX |XX XX |XX XX |XX XX | XXXXX |
- |XX XX |XX XX |XX XX |XX XX |XX XX |XX XX |XX XXX | XX |
- | | | | | | | | XXXX |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | C8 | C9 | CA | CB | CC | CD | CE | CF |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | XX | XX | XXXXX |XX XX | XX | XX | XXXX | XX XX |
- | XX | XX |X X | | XX | XX | X X | |
- |XXXXXXX |XXXXXXX |XXXXXXX |XXXXXXX | XXXX | XXXX | XXXX | XXXX |
- |XX |XX |XX |XX | XX | XX | XX | XX |
- |XXXXXX |XXXXX |XXXXXX |XXXXXX | XX | XX | XX | XX |
- |XX |XX |XX |XX | XX | XX | XX | XX |
- |XXXXXXX |XXXXXXX |XXXXXXX |XXXXXXX | XXXX | XXXX | XXXX | XXXX |
- | | | | | | | | |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- |XXXXX | XXX XX | XX | XX | XXXXX | XXX XX |XX XX | |
- | XX XX |XX XXX | XX | XX |X X |XX XXX | XXX |XX XX |
- | XX XX | | XXX | XXX | XXX | XXX | XX XX | XX XX |
- |XXXX XX |XXX XX | XX XX | XX XX | XX XX | XX XX |XX XX | XXX |
- | XX XX |XXXX XX |XX XX |XX XX |XX XX |XX XX |XX XX | XX XX |
- | XX XX |XX XXXX | XX XX | XX XX | XX XX | XX XX | XX XX |XX XX |
- |XXXXX |XX XXX | XXX | XXX | XXX | XXX | XXX | |
- | | | | | | | | |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | D8 | D9 | DA | DB | DC | DD | DE | DF |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | XXX X | XX | XX | XXXXX |XX XX | XX |XXXX | XXXX |
- | XX XX | XX | XX |X X | | XX | XX |XX XX |
- |XX XXX |XX XX |XX XX | |XX XX | XX XX | XXXXX |XX XX |
- |XX X XX |XX XX |XX XX |XX XX |XX XX | XX XX | XX XX |XX XX |
- |XXX XX |XX XX |XX XX |XX XX |XX XX | XXXX | XXXXX |XX XX |
- | XX XX |XX XX |XX XX |XX XX |XX XX | XX | XX |XX XX |
- |X XXX | XXXXX | XXXXX | XXXXX | XXXXX | XXXX |XXXX |XX XX |
- | | | | | | | | |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | E0 | E1 | E2 | E3 | E4 | E5 | E6 | E7 |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | XX | XX | XXXXX | XXX XX |XX XX | XX | | |
- | XX | XX |X X |XX XXX | | XX | | |
- | XXXX | XXXX | XXXX | XXXXX | XXXX | XXXX | XXXXXX | XXXXXX |
- | XX | XX | XX | XX | XX | XX | X X |XX |
- | XXXXX | XXXXX | XXXXX | XXXXXX | XXXXX | XXXXX |XXXXXXX |XX |
- |XX XX |XX XX |XX XX |XX XX |XX XX |XX XX |X X | XXXXXX |
- | XXX XX | XXX XX | XXX XX | XXXXXX | XXX XX | XXX XX |XXXXXXX | XX |
- | | | | | | | | XXX |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | E8 | E9 | EA | EB | EC | ED | EE | EF |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | XX | XX | XXXXX |XX XX | XX | XX | XXXXX | XX XX |
- | XX | XX |X X | | XX | XX |X X | |
- | XXXXX | XXXXX | XXXXX | XXXXX | | | XXX | XXX |
- |XX XX |XX XX |XX XX |XX XX | XXX | XXX | XX | XX |
- |XXXXXXX |XXXXXXX |XXXXXXX |XXXXXXX | XX | XX | XX | XX |
- |XX |XX |XX |XX | XX | XX | XX | XX |
- | XXXXX | XXXXX | XXXXX | XXXXX | XXXX | XXXX | XXXX | XXXX |
- | | | | | | | | |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | F0 | F1 | F2 | F3 | F4 | F5 | F6 | F7 |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | XX | XXX XX | XX | XX | XXXXX | XXX XX |XX XX | |
- | XXXXXX |XX XXX | XX | XX |X X |XX XXX | | XX |
- | XX | | XXXXX | XXXXX | XXXXX | XXXXX | XXXXX | |
- | XXXXX |XX XXX |XX XX |XX XX |XX XX |XX XX |XX XX | XXXXXX |
- |XX XX | XX XX |XX XX |XX XX |XX XX |XX XX |XX XX | |
- |XX XX | XX XX |XX XX |XX XX |XX XX |XX XX |XX XX | XX |
- | XXXX | XX XX | XXXXX | XXXXX | XXXXX | XXXXX | XXXXX | |
- | | | | | | | | |
- -------------------------------------------------------------------------
-
- -------------------------------------------------------------------------
- | F8 | F9 | FA | FB | FC | FD | FE | FF |
- |--------|--------|--------|--------|--------|--------|--------|--------|
- | | XX | XX | XXXX |XX XX | XX |XXX |XX XX |
- | X | XX | XX |X X | | XX | XX | |
- | XXXXX |XX XX |XX XX | |XX XX |XX XX | XXXXX |XX XX |
- |XX XXX |XX XX |XX XX |XX XX |XX XX |XX XX | XX XX |XX XX |
- |XX X XX |XX XX |XX XX |XX XX |XX XX |XX XX | XX XX |XX XX |
- |XXX XX |XX XX |XX XX |XX XX |XX XX | XXXXXX | XXXXX | XXXXXX |
- | XXXXX | XXX XX | XXX XX | XXX XX | XXX XX | XX | XX | XX |
- |X | | | | |XXXXXX |XXXX |XXXXXX |
- -------------------------------------------------------------------------
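
The glyph cells above can be turned into font data by reading each row as one byte. What follows is only a minimal sketch under stated assumptions: each cell is taken to be 8 columns wide, an X is a set pixel, and the leftmost column is the most significant bit. The names pack_glyph, render_glyph and E_ACUTE are hypothetical, and the sample cell's column alignment is illustrative rather than copied from the tables.

```python
def pack_glyph(rows, width=8):
    """Pack rows of 'X'/space text into one integer per row.

    Assumption: the leftmost column maps to the most significant bit.
    """
    packed = []
    for row in rows:
        # Pad or trim so that every row is exactly `width` columns.
        cell = row.ljust(width)[:width]
        value = 0
        for ch in cell:
            value = (value << 1) | (1 if ch == "X" else 0)
        packed.append(value)
    return packed


def render_glyph(packed, width=8):
    """Inverse of pack_glyph: turn row integers back into 'X'/space text."""
    return [
        "".join("X" if (value >> (width - 1 - i)) & 1 else " "
                for i in range(width))
        for value in packed
    ]


# Hypothetical 8x8 cell for a capital E with acute accent (illustrative spacing).
E_ACUTE = [
    "   XX   ",
    "  XX    ",
    "XXXXXXX ",
    "XX      ",
    "XXXXXX  ",
    "XX      ",
    "XXXXXXX ",
    "        ",
]

if __name__ == "__main__":
    print(" ".join(f"{b:02X}" for b in pack_glyph(E_ACUTE)))
```

Packing and then rendering a cell should give back the original rows, which is a simple check that a table was keyed in without losing a column.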
-
- Andr'e PIRARD
- SEGI Univ. de Li`ege
- B26 - Sart Tilman
- B-4000 Li`ege 1 (Belgium)
- PIRARD@BLIULG11 on EARN alias BITNET
- pirard@vm1.ulg.ac.be on Internet
-